Simplex Algorithm for Countable-State Discounted Markov Decision Processes

Authors

  • Ilbin Lee
  • Marina A. Epelman
  • H. Edwin Romeijn
  • Robert L. Smith
Abstract

We consider discounted Markov Decision Processes (MDPs) with countably-infinite state spaces, finite action spaces, and unbounded rewards. Typical examples of such MDPs are inventory management and queueing control problems in which there is no specific limit on the size of inventory or queue. Existing solution methods obtain a sequence of policies that converges to optimality in value but may not improve monotonically, i.e., a policy in the sequence may be worse than preceding policies. Our proposed approach considers countably-infinite linear programming (CILP) formulations of the MDPs (a CILP is defined as a linear program (LP) with countably-infinite numbers of variables and constraints). Under standard assumptions for analyzing MDPs with countably-infinite state spaces and unbounded rewards, we extend the major theoretical extreme point and duality results to the resulting CILPs. Under an additional technical assumption which is satisfied by several applications of interest, we present a simplex-type algorithm that is implementable in the sense that each of its iterations requires only a finite amount of data and computation. We show that the algorithm finds a sequence of policies which improves monotonically and converges to optimality in value. Unlike existing simplex-type algorithms for CILPs, our proposed algorithm solves a class of CILPs in which each constraint may contain an infinite number of variables and each variable may appear in an infinite number of constraints. A numerical illustration for inventory management problems is also presented.
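
For orientation, the textbook LP pair for a discounted, reward-maximizing MDP is sketched below. The notation is an assumption and is not taken from the paper: S is the countable state space, A(s) the finite action set in state s, r(s, a) the one-step reward, p(s' | s, a) the transition probability, \gamma in (0, 1) the discount factor, and \alpha a positive weighting over initial states. The abstract does not say which member of the pair the simplex-type algorithm operates on, and with countable S the infinite sums are well defined only under assumptions of the kind the abstract mentions.

```latex
\begin{align*}
\text{(P)}\quad \min_{v}\quad & \sum_{s \in S} \alpha(s)\, v(s) \\
\text{s.t.}\quad & v(s) - \gamma \sum_{s' \in S} p(s' \mid s, a)\, v(s') \;\ge\; r(s, a),
  && s \in S,\ a \in A(s), \\[6pt]
\text{(D)}\quad \max_{x \ge 0}\quad & \sum_{s \in S} \sum_{a \in A(s)} r(s, a)\, x(s, a) \\
\text{s.t.}\quad & \sum_{a \in A(s)} x(s, a)
  - \gamma \sum_{s' \in S} \sum_{a \in A(s')} p(s \mid s', a)\, x(s', a) \;=\; \alpha(s),
  && s \in S.
\end{align*}
```

In (D), the constraint for state s involves every pair (s', a) that can reach s, and the variable x(s', a) appears in the constraint of every state reachable from s'. When either set is infinite, a constraint contains infinitely many variables or a variable appears in infinitely many constraints, which is the structure the abstract says the proposed algorithm handles.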

Similar resources

Accelerated decomposition techniques for large discounted Markov decision processes

Many hierarchical techniques to solve large Markov decision processes (MDPs) are based on the partition of the state space into strongly connected components (SCCs) that can be classified into some levels. In each level, smaller problems named restricted MDPs are solved, and then these partial solutions are combined to obtain the global solution. In this paper, we first propose a novel algorith...

Compactness of the space of non-randomized policies in countable-state sequential decision processes

For sequential decision processes with countable state spaces, we prove compactness of the set of strategic measures corresponding to nonrandomized policies. For the Borel state case, this set may not be compact [14, p. 170] in spite of compactness of the set of strategic measures corresponding to all policies [17,2]. We use the compactness result from this paper to show the existence of optima...

Asymptotic properties of constrained Markov Decision Processes

We present in this paper several asymptotic properties of constrained Markov Decision Processes (MDPs) with a countable state space. We treat both the discounted and the expected average cost, with unbounded cost. We are interested in (1) the convergence of finite horizon MDPs to the infinite horizon MDP, (2) convergence of MDPs with a truncated state space to the problem with infinite state space,...

Denumerable Constrained Markov Decision Processes and Finite Approximations

The purpose of this paper is twofold. First, to establish the theory of discounted constrained Markov Decision Processes with countable state and action spaces and general multi-chain structure. Second, to introduce finite approximation methods. We define the occupation measures and obtain properties of the set of all achievable occupation measures under the different admissible policies. We est...

Continuous Time Markov Decision Processes with Expected Discounted Total Rewards

Abstract. This paper discusses continuous time Markov decision processes with criterion of expected discounted total rewards, where the state space is countable, the reward rate function is extended real-valued and the discount rate is a real number. Under necessary conditions that the model is well defined, the state space is partitioned into three subsets, on which the optimal value function ...

Journal:
  • Operations Research

Volume 65

Publication date: 2017